Search results for "Data stream mining"

Showing 10 of 35 documents

Hyperspectral dimensionality reduction for biophysical variable statistical retrieval

2017

Current and upcoming airborne and spaceborne imaging spectrometers produce vast hyperspectral data streams. This scenario calls for automated and optimized spectral dimensionality reduction techniques that enable fast and efficient hyperspectral data processing, for example to infer vegetation properties. In preparation for next-generation biophysical variable retrieval methods applicable to hyperspectral data, we evaluate 11 dimensionality reduction (DR) methods in combination with advanced machine learning regression algorithms (MLRAs) for statistical variable retrieval. Two unique hyperspectral datasets were analyzed to assess the predictive power of DR + MLRA methods to ret…

Subjects: 010504 meteorology & atmospheric sciences; Mean squared error; Computer science; 0211 other engineering and technologies; 02 engineering and technology; 01 natural sciences; Linear regression; Computers in Earth Sciences; Engineering (miscellaneous); Gaussian process; HyMap; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Data stream mining; Dimensionality reduction; Hyperspectral imaging; Pattern recognition; Atomic and Molecular Physics and Optics; Computer Science Applications; Kernel (statistics); Data mining; Artificial intelligence
Venue: ISPRS Journal of Photogrammetry and Remote Sensing
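The DR + MLRA pairing can be illustrated with the simplest combination: principal component analysis (a standard DR method) followed by a linear fit on the component score, i.e. principal component regression. This is a minimal sketch under stated assumptions, not one of the configurations evaluated in the paper: spectra are plain lists of reflectances, only the first component is kept, it is found by power iteration, and the function names are hypothetical.

```python
def mean_center(rows):
    """Subtract the per-band mean from every spectrum."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def leading_component(centered, iters=100):
    """Leading eigenvector of the band covariance, by power iteration on X^T X."""
    d = len(centered[0])
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading component
    for _ in range(iters):
        scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]  # X v
        w = [sum(s * x[j] for s, x in zip(scores, centered)) for j in range(d)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


def fit_pcr(spectra, targets):
    """Reduce spectra to one principal-component score, then fit y = a + b * score."""
    centered, means = mean_center(spectra)
    v = leading_component(centered)
    scores = [sum(x[j] * v[j] for j in range(len(v))) for x in centered]
    y_mean = sum(targets) / len(targets)
    # scores have zero mean because the spectra were centered, so a = y_mean
    b = sum(s * (y - y_mean) for s, y in zip(scores, targets)) / sum(s * s for s in scores)
    return means, v, y_mean, b


def predict_pcr(model, spectrum):
    """Project a new spectrum on the stored component and apply the linear fit."""
    means, v, a, b = model
    score = sum((spectrum[j] - means[j]) * v[j] for j in range(len(v)))
    return a + b * score
```

Keeping more components, or replacing the linear fit with a kernel regressor such as a Gaussian process, follows the same fit-on-scores pattern.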

Summarizing the state of the terrestrial biosphere in few dimensions

2020

In times of global change, we must closely monitor the state of the planet to understand the full complexity of these changes. Each of the Earth's subsystems (the biosphere, atmosphere, hydrosphere, and cryosphere) can be analyzed from a multitude of data streams. However, because it is very hard to interpret multiple monitoring data streams jointly, one often aims for a summarizing indicator. Climate indices, for example, summarize the state of atmospheric circulation in a region. Although such approaches are also used in other fields of science, they are rarely used to describe land surface dynamics. Here, we propose a robust method to crea…

Subjects: 0106 biological sciences; 010504 meteorology & atmospheric sciences; Atmospheric circulation; lcsh:Life; 0207 environmental engineering; 02 engineering and technology; 010603 evolutionary biology; 01 natural sciences; lcsh:QH540-549.5; Cryosphere; 020701 environmental engineering; Ecology Evolution Behavior and Systematics; 0105 earth and related environmental sciences; Earth-Surface Processes; Data stream mining; lcsh:QE1-996.5; Biosphere; Global change; 15. Life on land; Albedo; lcsh:Geology; lcsh:QH501-531; Arctic; 13. Climate action; Climatology; Environmental science; lcsh:Ecology; Hydrosphere

A Methodology to Derive Global Maps of Leaf Traits Using Remote Sensing and Climate Data

2018

This paper introduces a modular processing chain to derive global high-resolution maps of leaf traits. In particular, we present global maps at 500 m resolution of specific leaf area, leaf dry matter content, leaf nitrogen and phosphorus content per dry mass, and leaf nitrogen/phosphorus ratio. The processing chain exploits machine learning techniques along with optical remote sensing data (MODIS/Landsat) and climate data for gap filling and up-scaling of in situ measured leaf traits. The chain first uses random forests regression with surrogates to fill gaps in the database (> 45% missing entries) and maximizes the global representativeness of the trait dataset. Plant species are then a…

Subjects: 0106 biological sciences; FOS: Computer and information sciences; 010504 meteorology & atmospheric sciences; Specific leaf area; Climate; Forest and Landscape Ecology; Soil Science; FOS: Physical sciences; Applied Physics (physics.app-ph); 010603 evolutionary biology; 01 natural sciences; Statistics - Applications; Goodness of fit; Abundance (ecology); Machine learning; Applications (stat.AP); Computers in Earth Sciences; Plant ecology; Vegetation; 0105 earth and related environmental sciences; Remote sensing; Mathematics; 2. Zero hunger; Plant traits; Data stream mining; Landsat; MODIS; Random forests; Geology; Global Map; Regression analysis; Physics - Applied Physics; 15. Life on land; PE&RC; Random forest; Trait; Vegetation Forest and Landscape Ecology

Towards Quantifying Non-Photosynthetic Vegetation for Agriculture Using Spaceborne Imaging Spectroscopy

2021

Non-photosynthetic vegetation (NPV) has been identified as a priority variable in the context of new spaceborne imaging spectroscopy missions. In this study we provide a first attempt to quantify NPV biomass from the unprecedented data streams to be provided by multiple recently launched or planned instruments. We propose a hybrid workflow that combines Gaussian process regression (GPR) trained over radiative transfer model (RTM) simulations with active learning strategies. A soybean field dataset including two dates with NPV measurements on yellow and senescent (brown) plant organs was used for model validation, yielding a relative error of 13.4%. This prototype retrieval model wa…

Subjects: 2. Zero hunger; 010504 meteorology & atmospheric sciences; Data stream mining; 0211 other engineering and technologies; EnMAP; Hyperspectral imaging; Context (language use); PRISMA; 02 engineering and technology; Vegetation; Vegetation functional trait; 01 natural sciences; Lignin; Imaging spectroscopy; Atmospheric radiative transfer codes; Workflow; Hybrid approach; CHIME; Kriging; Environmental science; Cellulose; Gaussian process regression; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; Remote sensing
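The abstract does not say which active learning criterion is used; a common one for GPR is uncertainty sampling, where the next training sample is the pool point with the largest posterior variance. The sketch below assumes scalar inputs, a squared-exponential kernel with fixed hyperparameters, and an `oracle` function standing in for the RTM simulation or field label; all names are hypothetical, and the dense solver is only suitable for a handful of points.

```python
import math


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(matrix)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def gp_predict(xs, ys, x_star, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, solve(k, ys)))
    var = rbf(x_star, x_star) - sum(ks * v for ks, v in zip(k_star, solve(k, k_star)))
    return mean, var


def active_learning(pool, oracle, start, rounds):
    """Uncertainty sampling: repeatedly label the pool point with the largest GP variance."""
    xs = list(start)
    ys = [oracle(x) for x in xs]
    for _ in range(rounds):
        candidates = [x for x in pool if x not in xs]
        best = max(candidates, key=lambda x: gp_predict(xs, ys, x)[1])
        xs.append(best)
        ys.append(oracle(best))
    return xs, ys
```

On a pool of inputs spaced 0.25 apart on [0, 3], starting from x = 0, the first two queries are the far end (3.0) and then the midpoint (1.5): the sampler fills the largest gaps in the training set first.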

Earth system data cubes unravel global multivariate dynamics

2020

Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cu…

Subjects: Agriculture and Food Sciences; DECOMPOSITION; 0106 biological sciences; FLUXES; Dependency (UML); lcsh:Dynamic and structural geology; 010504 meteorology & atmospheric sciences; Interface (Java); Computer science; DIMENSIONALITY; 010603 evolutionary biology; 01 natural sciences; ESA; Data cube; 03 medical and health sciences; lcsh:QE500-639.5; TEMPERATURE SENSITIVITY; lcsh:Science; 030304 developmental biology; 0105 earth and related environmental sciences; 0303 health sciences; Data stream mining; lcsh:QE1-996.5; SCIENCE; FRAMEWORK; Data science; PRODUCTS; lcsh:Geology; MODEL; Earth system science; Variable (computer science); Workflow; 13. Climate action; General Earth and Planetary Sciences; lcsh:Q; SOIL RESPIRATION; Curse of dimensionality
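The interoperability argument behind data cubes is that many variables share the same coordinate axes, so one operation (for example, averaging over time) applies to all of them at once. The toy sketch below is only an illustration of that general idea, not the Earth system data cube implementation described in the paper: the cube is assumed to be a plain dict keyed by (variable, time, lat, lon) tuples, the fixed axis order is an assumption, and `reduce_dim` is a hypothetical name.

```python
DIMS = ("variable", "time", "lat", "lon")  # assumed, fixed axis order


def reduce_dim(cube, dim, fn):
    """Collapse one dimension of the cube with fn, keeping all other dimensions.

    cube maps (variable, time, lat, lon) tuples to values; because every
    variable sits on the same shared axes, one call handles all variables.
    """
    i = DIMS.index(dim)
    groups = {}
    for key, value in cube.items():
        rest = key[:i] + key[i + 1:]  # drop the coordinate being reduced
        groups.setdefault(rest, []).append(value)
    return {rest: fn(values) for rest, values in groups.items()}
```

For example, reducing over `"time"` with a mean function turns a cube of daily values into per-variable, per-pixel averages. Real data cubes hold the values in labelled N-dimensional arrays rather than a dict, but the shared-axis principle is the same.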

Sequential Mining Classification

2017

Sequential pattern mining is a data mining technique that extracts and analyzes frequent subsequences from sequences of events or items under time constraints. Sequence data mining was introduced in 1995 with Apriori-based algorithms, which analyzed transactions over time to extract frequent patterns from the sequences of products associated with each customer. The technique later proved useful in many applications, including DNA research, medical diagnosis and prevention, and telecommunications. GSP, SPAM, SPADE, PrefixSpan, and other advanced algorithms followed. In view of the evolution of data mining techniques for sequential data, this paper discusses the multiple …

Subjects: Apriori algorithm; Computer science; Data stream mining; Concept mining; 02 engineering and technology; Machine learning; GSP Algorithm; Tree (data structure); Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Data mining; Artificial intelligence; K-optimal pattern discovery; FSA-Red Algorithm
Venue: 2017 International Conference on Computer and Applications (ICCA)
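The level-wise idea shared by Apriori-style sequence miners and GSP can be sketched for the simple case where each event is a single item. The sketch counts how many sequences contain a pattern as an ordered, not necessarily contiguous, subsequence; keeps patterns whose support meets a threshold; and builds longer candidates only from frequent ones. It is a simplified sketch: itemset events and the time constraints mentioned in the abstract are ignored, and the function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's items appear in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` advances the iterator


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_sequential_patterns(database, min_support):
    """Level-wise (Apriori/GSP-style) mining of frequent item sequences."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({item for seq in database for item in seq})
    frequent = [(i,) for i in items if support((i,), database) >= min_support]
    result = {p: support(p, database) for p in frequent}
    while frequent:
        freq_set = set(frequent)
        candidates = set()
        for a in frequent:
            for b in frequent:
                # join: a without its first item must equal b without its last item
                if a[1:] == b[:-1]:
                    cand = a + (b[-1],)
                    # prune: every subsequence with one item removed must be frequent
                    if all(cand[:k] + cand[k + 1:] in freq_set for k in range(len(cand))):
                        candidates.add(cand)
        frequent = sorted(c for c in candidates if support(c, database) >= min_support)
        result.update({c: support(c, database) for c in frequent})
    return result
```

For example, with `min_support = 2` over the purchase sequences [bread, milk, beer], [bread, milk, eggs, beer], [milk, bread, beer] and [milk, eggs], the longest frequent pattern is (bread, milk, beer) with support 2, while the candidate (bread, milk, eggs) is pruned without a database scan because (bread, eggs) occurs only once.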

Modeling Multi-label Recurrence in Data Streams

2019

Most existing data stream algorithms assume a single label as the target variable. However, in many applications each observation is assigned several labels with latent dependencies among them, and the target function may change over time. Classifying such non-stationary multi-label streaming data while accounting for label dependencies and potential drifts is a challenging task. The few existing studies mostly cope with drifts implicitly, and all of them learn models in the original label space, which requires a lot of time and memory. None of them consider recurrent drifts in multi-label streams, particularly drifts and recurrences visible in a latent label spa…

Subjects: Change over time; Multi-label classification; Data stream; Computer science; Data stream mining; Space dimension; Pattern recognition; ComputingMethodologies_PATTERNRECOGNITION; Streaming data; Artificial intelligence; Classifier (UML); Decoding methods
Venue: 2019 IEEE International Conference on Big Knowledge (ICBK)

On the Online Classification of Data Streams Using Weak Estimators

2016

In this paper, we propose a novel online classifier for complex data streams with non-stationary stochastic properties. Instead of using a single training model and counters to keep important data statistics, the introduced online classifier provides a real-time, self-adjusting learning model. The learning model applies the multiplication-based update algorithm of the Stochastic Learning Weak Estimator (SLWE) at each time instant as a new labeled instance arrives. In this way, the data statistics are updated every time a new element is inserted, without having to rebuild the model when the data distribution changes. Finally, and most impo…

Subjects: Complex data type; Training set; Learning automata; Computer science; Data stream mining; Estimator; 020206 networking & telecommunications; 02 engineering and technology; Machine learning; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Data mining; Artificial intelligence; Classifier (UML); Juncture
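The multiplication-based SLWE update can be sketched for one categorical feature. When symbol i is observed, every other probability is multiplied by a constant λ in (0, 1) and p_i takes the remaining mass, which is equivalent to p_i ← λ·p_i + (1 − λ); old observations therefore fade geometrically and the estimate tracks distribution changes. How the paper turns these estimates into a classifier is not given in the excerpt, so the per-class "highest current probability" rule, the class names, and λ = 0.9 below are assumptions for illustration.

```python
class SLWE:
    """Stochastic Learning Weak Estimator for a categorical (multinomial) variable."""

    def __init__(self, n_symbols, lam=0.9):
        self.lam = lam
        self.p = [1.0 / n_symbols] * n_symbols  # start from the uniform distribution

    def update(self, i):
        """Shrink all other probabilities by lam; symbol i absorbs the freed mass."""
        for j in range(len(self.p)):
            if j != i:
                self.p[j] *= self.lam
        self.p[i] = 1.0 - sum(pj for j, pj in enumerate(self.p) if j != i)


class SLWEClassifier:
    """Hypothetical online classifier: one SLWE per class over a single categorical feature."""

    def __init__(self, n_classes, n_symbols, lam=0.9):
        self.models = [SLWE(n_symbols, lam) for _ in range(n_classes)]

    def learn(self, symbol, label):
        # each labelled instance updates only its own class's estimator
        self.models[label].update(symbol)

    def predict(self, symbol):
        # choose the class whose current estimate gives the symbol the highest probability
        return max(range(len(self.models)), key=lambda c: self.models[c].p[symbol])
```

A smaller λ forgets faster and reacts sooner to drift at the cost of noisier estimates; no model is rebuilt when the stream changes, since each update is O(number of symbols).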

Moving Learning Machine Towards Fast Real-Time Applications: A High-Speed FPGA-based Implementation of the OS-ELM Training Algorithm

2018

Currently, there are some emerging online learning applications that handle data streams in real time. The On-line Sequential Extreme Learning Machine (OS-ELM) has been successfully used in real-time condition prediction applications because of its good generalization performance at an extreme learning speed, but the training frequency (number of trainings per second) achieved in these continuous learning applications still needs to be improved. This paper proposes a performance-optimized implementation of the OS-ELM training algorithm for real-time applications. In this case, the natural way of feeding the training of the neural network is one-by-one, i.e., training the neur…

Subjects: Computer Networks and Communications; Computer science; Real-time computing; Parameterized complexity; lcsh:TK7800-8360; 02 engineering and technology; Extreme learning machine; 0202 electrical engineering electronic engineering information engineering; Sensitivity (control systems); Electrical and Electronic Engineering; Computer engineering; Field-programmable gate array; FPGA; Electrical engineering; Artificial neural network; Data stream mining; lcsh:Electronics; 020206 networking & telecommunications; OS-ELM; real-time learning; Hardware and Architecture; Control and Systems Engineering; on-chip training; Signal Processing; on-line learning; 020201 artificial intelligence & image processing; Distributed memory; online sequential ELM; hardware implementation; Algorithm
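One-by-one OS-ELM training amounts to a rank-one recursive least squares update of the output weights, with the random hidden layer left fixed. The sketch below shows that arithmetic in plain Python for a single output; it does not model the paper's FPGA design. The original OS-ELM starts from an initial batch of samples, whereas here P is initialised as I/δ (a regularised start), which is an assumption; the class name and parameters are hypothetical.

```python
import math
import random


class OSELM:
    """Sketch of one-by-one OS-ELM training for a single-output network."""

    def __init__(self, n_inputs, n_hidden, delta=1e-4, seed=0):
        rng = random.Random(seed)
        # random, fixed input weights and biases of the hidden layer
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden  # learned output weights
        # P = I / delta replaces the batch initialisation phase (assumption)
        self.P = [[(1.0 / delta if i == j else 0.0) for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def hidden(self, x):
        """Sigmoid hidden-layer output h(x)."""
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def predict(self, x):
        return sum(h * bt for h, bt in zip(self.hidden(x), self.beta))

    def train_one(self, x, t):
        """P <- P - P h h^T P / (1 + h^T P h);  beta <- beta + P h (t - h^T beta)."""
        h = self.hidden(x)
        n = len(h)
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(h[i] * Ph[i] for i in range(n))
        # P stays symmetric, so P h h^T P is the outer product Ph Ph^T
        self.P = [[self.P[i][j] - Ph[i] * Ph[j] / denom for j in range(n)] for i in range(n)]
        err = t - sum(hi * bi for hi, bi in zip(h, self.beta))
        Ph_new = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        self.beta = [bi + pi * err for bi, pi in zip(self.beta, Ph_new)]
```

Each call to `train_one` costs O(L²) for L hidden units regardless of how many samples came before, which is what makes one-by-one training attractive for streams and for fixed hardware pipelines.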

Efficient anomaly detection on sampled data streams with contaminated phase I data

2020

Control chart algorithms aim to monitor a process over time. Monitoring consists of two phases: Phase I, also called the learning phase, estimates the normal process parameters, and Phase II detects anomalies. However, the learning phase itself can contain contaminated data such as outliers. If left undetected, they can jeopardize the accuracy of the whole chart by distorting the computed parameters, which leads to faulty classifications and defective data analysis results. This problem becomes more severe when the analysis is done on a sample of the data rather than on the whole data. To avoid such a situation, Phase I quality must be guaranteed. The purpose…

Subjects: Computer science; Sample (material); 0211 other engineering and technologies; 02 engineering and technology; [INFO.INFO-SE] Computer Science [cs]/Software Engineering [cs.SE]; 01 natural sciences; [INFO.INFO-IU] Computer Science [cs]/Ubiquitous Computing; 010104 statistics & probability; [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]; Chart; Control chart; EWMA chart; 0101 mathematics; 021103 operations research; Data stream mining; Pattern recognition; [INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation; [INFO.INFO-MA] Computer Science [cs]/Multiagent Systems [cs.MA]; Outlier; Anomaly detection; [INFO.INFO-ET] Computer Science [cs]/Emerging Technologies [cs.ET]; Artificial intelligence; [INFO.INFO-DC] Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; Gibbs sampling
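The two-phase structure, and why contaminated Phase I data matters, can be sketched with an EWMA chart, the chart type listed in the subjects. The paper's own Phase I procedure is cut off in the excerpt, so the iterative k-sigma trimming below is only a simple stand-in for robust estimation; λ = 0.2 and L = 3 are assumed settings, and the function names are hypothetical. Phase II uses the standard EWMA statistic z_t = λx_t + (1 − λ)z_{t−1} with time-varying limits μ ± Lσ·sqrt(λ/(2 − λ)·(1 − (1 − λ)^(2t))).

```python
import math
import statistics


def phase_one(sample, k=3.0, max_rounds=10):
    """Estimate in-control mean and std, iteratively discarding points beyond k sigma."""
    data = list(sample)
    for _ in range(max_rounds):
        mu = statistics.mean(data)
        sd = statistics.stdev(data)
        kept = [x for x in data if abs(x - mu) <= k * sd]
        if len(kept) == len(data):
            break  # nothing left to trim
        data = kept
    return statistics.mean(data), statistics.stdev(data)


def ewma_alarms(stream, mu, sd, lam=0.2, L=3.0):
    """Phase II EWMA chart: return indices where the statistic leaves its control limits."""
    z = mu  # the EWMA statistic starts at the in-control mean
    alarms = []
    for t, x in enumerate(stream, start=1):
        z = lam * x + (1 - lam) * z
        width = L * sd * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu) > width:
            alarms.append(t - 1)
    return alarms
```

For example, a Phase I sample of twenty readings alternating 9 and 11 plus one reading of 100 gives μ ≈ 14.3 and σ ≈ 19.7 if used as-is, but μ = 10 and σ ≈ 1.03 after trimming. With the trimmed estimate, a stream of ten readings at 10 followed by ten at 14 first raises an alarm at index 11; with the contaminated estimate the chart raises no alarm at all on the same stream.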